Abstract

The formulation of the decision making process of a failure detection algorithm as
a  Bayes  sequential  decision  problem  provides  a  simple  conceptualization  of  the 
decision  rule  design  problem.  As  the  optimal  Bayes  rule  is  not  computable,  a 
methodology  that  is  based  on  the  Bayesian  approach  and  aimed  at  a  reduced 
computational  requirement  is  developed  for  designing  suboptimal  rules.  A  numerical 
algorithm  is  constructed  to  facilitate  the  design  and  performance  evaluation  of  these 
suboptimal  rules.  The  result  of  applying  this  design  methodology  to  an  example 
shows  that  this  approach  is  potentially  a  useful  one. 
1.  INTRODUCTION


A failure detection and identification (FDI) process consists of two basic stages:
residual generation and decision making. In the first stage, sensor outputs are
processed to form residuals that typically have distinct characteristics under normal
(no-fail) conditions and under the various possible failure modes. (See [1] for a
discussion of the design of residual generation processes.) The function of the second
stage is to monitor the residuals and make decisions concerning the occurrence and
identity of failure modes. The decision mechanism is based on a compromise among
speed of detection, false alarm rates, and identification accuracy, and it belongs to the
extensively studied class of sequential tests or sequential decision rules [2-15]. Most
previous work, however, focused on either the detection of a single type of
change (failure) [5-9] or the sequential testing of M hypotheses, which is analogous to
the problem of identifying the failure mode given that the onset time is known [12-14]. In
this paper, we employ the Bayesian approach to the design of decision rules that
directly confront the problem of detecting and distinguishing the various possible failure
modes that may occur at unknown times.

In Section 2 we describe the Bayes formulation of the FDI decision problem.
Although the optimal rule is generally not computable, the structure of the Bayesian
approach can be used to derive practical suboptimal rules. The design of suboptimal
rules based on the Bayes formulation is discussed in Section 3. The approximations
and simplifications that are made in order to obtain these rules make systematic use of
the important features specific to the problem of dynamic failure detection and
consequently allow us to interpret each step in our simplification procedure in terms of
its implications for failure detection. In Section 4 we report on our experience with
this approach to designing decision rules through a numerical example and simulation.
2.  THE BAYESIAN APPROACH

In  this  section  we  adapt  and  specialize  the  standard  Bayes  Sequential  Decision 
Problem (BSDP) [15] to the problem of failure detection. The BSDP formulation of
the  FDI  problem  consists  of  six  elements: 

1. $\Theta$ : the set of states of nature or failure hypotheses. For simplicity in this
development we assume that only single failures may occur. In general an
element $\theta$ of $\Theta$ conveys several pieces of information, namely, the type of failure
mode, its time of occurrence, and possibly a variable specifying the severity of
the failure. For example, if a particular failure mode corresponds to the onset of
a sensor bias, the level of this bias could be specified in the corresponding
element of $\Theta$. In many applications, however, it suffices simply to identify the
failure type without estimating its severity. Furthermore, what is often done to
eliminate this nuisance parameter completely is to hypothesize a fixed scale for
each failure type corresponding to the smallest deviation from normal behavior
that one would like to detect. For example, this approach was used with great
success for the detection of aircraft sensor failures in [16]. We will adopt this
approach here, and consequently elements of $\Theta$ are 2-tuples, $\theta = (i,\tau)$,
corresponding to the onset of the ith failure mode at time $\tau$. We assume that
there are M hypothesized failure modes and also denote by $(0,-)$ that element of $\Theta$
corresponding to no failure. Thus,

$$\Theta = \{(i,\tau),\ i=1,\ldots,M,\ \tau=1,2,\ldots\} \cup \{(0,-)\}$$

2. $\mu$ : the prior probability mass function (PMF) over the nature set $\Theta$. This PMF
represents the a priori information concerning possible failures, i.e. how likely it is
for each type of failure to occur, and when a failure is likely to occur. Because
this information may not be available or accurate in some cases, the need to
specify $\mu$ is a drawback of the Bayes approach for such cases. Nevertheless, we
will see that it can be regarded as a design parameter in the specification of the
Bayes rule.

In general, $\mu$ may be arbitrary. Here, we assume the underlying failure
process has two properties: 1) the occurrence of each of the M failure modes is
independent of the others, and 2) the occurrence of each failure i is a Bernoulli
process with (success) parameter $p_i$, a common model for failures in physical
components. The independence assumption is also a reasonable one in most
applications. It is straightforward to show that

$$\mu(i,\tau) = a(i)\,\rho\,(1-\rho)^{\tau-1}, \qquad i=1,\ldots,M,\ \tau=1,2,\ldots$$

where

$$\rho = 1 - \prod_{j=1}^{M} (1-p_j)$$

$$a(i) = p_i (1-p_i)^{-1} \Big[ \sum_{j=1}^{M} p_j (1-p_j)^{-1} \Big]^{-1}$$

The parameter $\rho$ may be regarded as the parameter of the combined (Bernoulli)
failure process, which specifies the statistics of the occurrence of the first failure;
$a(i)$ can be interpreted as the marginal probability that the first failure is of type i.
Note that the present choice of $\mu$ indicates that the arrival of the first failure is
memoryless.
3. $\mathcal{D}(k)$ : the discrete set of terminal decisions available to the decision maker when
the residual monitoring is interrupted at time k in order to make a failure
identification. An element $\delta$ of $\mathcal{D}(k)$ may denote the pair $(j,t)$, i.e. the
declaration that a type j failure occurred at time $t \le k$. Alternatively, $\delta$ may
represent an identification of the j-th failure without regard for the failure time, or
it may signify the presence of a failure without specifying its type or time, i.e.
simply an alarm. Note that the number of terminal decisions specifying failure
times grows with k (as there are more times at which a failure could have
occurred) while the number of decisions not specifying times remains the
same. In addition, $\mathcal{D}(k)$ does not include the declaration of no-failure, since the
residual monitoring is stopped only when a failure appears to have occurred. It is
worth pointing out that while in some applications one may not be interested in
estimating failure onset times, there are others in which one is. For example, if a
failed sensor has been used for some time in a closed-loop filter and control law,
one may wish to estimate how long the failure has been present in order to
compensate for the effect of this erroneous signal. In addition, onset time
estimates are critical in other event detection problems such as electrocardiogram
analysis [17] and maneuver detection [18,19].

4. $L(k;\theta,\delta)$ : the terminal decision cost function at time k. $L(k;\theta,\delta)$ denotes the
penalty for deciding $\delta \in \mathcal{D}(k)$ at time k when the true state of nature is $\theta$. It is
assumed to be bounded and non-negative and to have the structure:

$$L(k;(i,\tau),\delta) = \begin{cases} L((i,\tau),\delta), & \tau \le k,\ \delta \in \mathcal{D}(k) \\ L_F, & \tau > k,\ \delta \in \mathcal{D}(k) \end{cases}$$

$$L(k;(0,-),\delta) = L_F$$

where $L((i,\tau),\delta)$ is the underlying cost function for deciding $\delta$ when failure $(i,\tau)$
has already occurred. Also, $L_F$ denotes the penalty for a false alarm (note that a
false alarm corresponds to making a failure declaration before one occurs), and it
can be generalized by allowing it to be a function of $\delta$.

The cost function $L((i,\tau),\delta)$ generally has some additional structure. For
example, a terminal decision that indicates the correct failure (and/or onset
time) should receive a lower cost than one with the wrong failure (and/or onset
time) indication. We further assume that the penalty due to an incorrect
identification of the failure time depends only on the error in such an
identification. That is, for $\delta = (j,t)$,

$$L(k;(i,\tau),(j,t)) = L(i,j,(t-\tau))$$

Note that $L(i,i,(t-\tau))$ corresponds to the penalty for an incorrect time estimate
when the failure type is correctly determined. Again the use and importance of
this cost depends upon the application. Finally, if onset time is unimportant, so
that $\delta$ does not contain a time specification, we have

$$L((i,\tau),\delta) = L(i,\delta)$$

5. $r(k)$ : the m-dimensional residual (observation) sequence. We shall let
$p(r(1),\ldots,r(k)\,|\,i,\tau)$ denote their joint conditional density when $(i,\tau)$ is true. Since
the residual is affected by the failure in a causal manner, its conditional density
has the property

$$p(r(1),\ldots,r(k)\,|\,i,\tau) = p(r(1),\ldots,r(k)\,|\,0,-), \qquad i=1,\ldots,M,\ \tau > k$$

In this paper, we will assume that the residual is an independent Gaussian
sequence with $V$ (an $m \times m$ matrix) as the time-independent covariance function and
$g_i(k-\tau)$ as the mean given that the failure $(i,\tau)$ has occurred. With the
covariance assumed to be the same for all failures, the mean function $g_i(k-\tau)$
characterizes the effect of the failure $(i,\tau)$, and it is henceforth called the
signature of $(i,\tau)$ (with $g_i(k-\tau) = 0$ for $i=0$ or $\tau > k$). We have chosen to study
this type of residual because its special structure facilitates the development of
insights into the design of decision rules. Such a model arises in the case in
which the residuals are generated by a Kalman filter based on normal operation
and in which the failures enter additively in the system dynamics or sensor outputs
[20]. While this model is not correct if parametric failures are considered (since in
this case the correlation structure of the residuals is also affected by the failure),
the general concepts we develop for the formulation of a BSDP for failure
detection carry over to the parametric case. Furthermore, as reported in [16,21],
an FDI system based on an appropriate additive-failure model can often work
very well in detecting parametric failures.


6. $c(k,(i,\tau))$ : the delay cost function having the properties:

$$c(k,(i,\tau)) = \begin{cases} c(i,k-\tau) > 0, & \tau < k \\ 0, & \tau \ge k \end{cases}$$

$$c(i,k_1-\tau) > c(i,k_2-\tau), \qquad k_1 > k_2 > \tau$$

After a failure has occurred at time $\tau$, there is a penalty for delaying the terminal
decision until time $k > \tau$, with the penalty an increasing function of the delay
$(k-\tau)$. In the absence of a failure, no penalty is imposed on residual sampling.
In this paper we will consider a delay cost function that is linear in the delay, i.e.
$c(i,k-\tau) = c(i)(k-\tau)$, where $c(i)$ is a positive function of the failure type i, and
may be used to provide different delay penalties for different types of failures.
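
The prior described in element 2 above can be tabulated numerically. The following is a minimal sketch, assuming the geometric form $\mu(i,\tau) = a(i)\rho(1-\rho)^{\tau-1}$ with $\rho = 1-\prod_j(1-p_j)$ and $a(i)$ proportional to $p_i/(1-p_i)$; the function name `failure_prior`, the truncation horizon `tau_max`, and the example probabilities are hypothetical and are not taken from the paper.

```python
def failure_prior(p, tau_max):
    """Tabulate the prior over (i, tau) for per-step failure probabilities p[i].

    Returns (rho, a, mu), where mu[i][tau - 1] approximates mu(i, tau) for
    tau = 1, ..., tau_max (the infinite range of tau is truncated, an assumption).
    """
    survive = 1.0
    for p_j in p:
        survive *= 1.0 - p_j
    rho = 1.0 - survive                      # parameter of the combined process

    odds = [p_j / (1.0 - p_j) for p_j in p]
    total = sum(odds)
    a = [o / total for o in odds]            # probability the first failure is type i

    mu = [[a_i * rho * (1.0 - rho) ** (tau - 1) for tau in range(1, tau_max + 1)]
          for a_i in a]
    return rho, a, mu


# Hypothetical example with two failure modes.
rho, a, mu = failure_prior([0.01, 0.02], tau_max=2000)
total_mass = sum(sum(row) for row in mu)
```

In this example the truncated prior carries essentially all of the probability mass, which reflects the fact that under this model a failure eventually occurs with probability one.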

A sequential decision rule naturally consists of two parts: a stopping rule
(sampling plan) and a terminal decision rule. The stopping rule is essentially a
detection rule, as its purpose is to determine whether monitoring should be interrupted
in order to identify a failure. The terminal decision rule then performs the subsequent
identification. The stopping rule, denoted by
$\Phi = (\phi(0),\phi(1;r(1)),\ldots,\phi(k;r(1),\ldots,r(k)),\ldots)$, is a sequence of functions of the observed
residual samples, with $\phi(k;r(1),\ldots,r(k)) = 1$ or 0. When $\phi(k;r(1),\ldots,r(k)) = 1$ (0),
residual monitoring or sampling is interrupted (continued) after the k-th residual
sample, $r(k)$, is observed. Alternatively, the stopping rule may be defined by another
sequence of functions $\Psi = (\psi(0),\psi(1;r(1)),\ldots,\psi(k;r(1),\ldots,r(k)),\ldots)$, where
$\psi(k;r(1),\ldots,r(k)) = 1$ indicates that residual monitoring has not been interrupted up to
and including time $(k-1)$ but will be interrupted when residual samples $r(1),\ldots,r(k)$ are
observed [15]. The functions $\Phi$ and $\Psi$ are related to each other in the following way:

$$\psi(k;r(1),\ldots,r(k)) = \phi(k;r(1),\ldots,r(k)) \prod_{s=0}^{k-1} \big[1-\phi(s;r(1),\ldots,r(s))\big], \qquad k \ge 1$$

with $\psi(0) = \phi(0)$.
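
The relation between the two descriptions of the stopping rule can be illustrated for a single realization of the residuals. This is a minimal sketch, assuming the stop/continue flags have already been evaluated along one sample path; the function name `first_stop_indicators` and the example flags are hypothetical.

```python
def first_stop_indicators(phi):
    """Convert stop/continue flags phi[k] into first-stop flags psi[k].

    psi[k] = phi[k] * prod_{s < k} (1 - phi[s]), with psi[0] = phi[0].
    """
    psi = []
    still_running = 1                 # running product of (1 - phi[s]) for s < k
    for phi_k in phi:
        psi.append(phi_k * still_running)
        still_running *= 1 - phi_k
    return psi


# Hypothetical flags: the rule would stop at k = 2 and again at k = 4,
# but only the first interruption counts.
phi = [0, 0, 1, 0, 1]
psi = first_stop_indicators(phi)
```

At most one entry of `psi` equals 1 on any sample path, which is what allows expected costs to be written as a sum over k of terms weighted by the first-stop indicator.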

The terminal decision rule is a sequence of functions,
$D = (d(0),d(1;r(1)),\ldots,d(k;r(1),\ldots,r(k)),\ldots)$. The function $d(k;r(1),\ldots,r(k))$ maps the
residual samples $r(1),\ldots,r(k)$ into the terminal decision set $\mathcal{D}(k)$ and represents the
decision rule used to arrive at a failure identification if sampling is interrupted at time
k.

If $(i,\tau)$ is the true state of nature and if the sequential decision rule $(\Phi,D)$ is used,
then the total expected cost, i.e. the expectation of the sum of the delay and terminal
decision costs, is

$$U[(i,\tau),(\Phi,D)] = \sum_{k=0}^{\infty} E_{i,\tau}\Big\{ \psi(k;r(1),\ldots,r(k)) \big[ c(k,(i,\tau)) + L(k;(i,\tau),d(k;r(1),\ldots,r(k))) \big] \Big\}$$

where $E_{i,\tau}$ denotes the expectation given that $(i,\tau)$ is true. The Bayes Sequential
Decision Rule (BSDR) with respect to $\mu$ is defined to be the sequential decision rule
$(\Phi^*,D^*)$ that minimizes the sequential Bayes risk $U_s(\Phi,D)$, which is given by

$$U_s(\Phi,D) = E\{U[(i,\tau),(\Phi,D)]\} = \sum_{i=1}^{M} \sum_{\tau=1}^{\infty} \mu(i,\tau)\, U[(i,\tau),(\Phi,D)]$$

Now we discuss an interpretation of the sequential Bayes risk for the FDI problem.
Let us define the following notation:

$$P_F(\tau) = \sum_{k=1}^{\tau-1} E_{0,-}\{\psi(k;r(1),\ldots,r(k))\}$$

$$\mathcal{D} = \bigcup_{k=0}^{\infty} \mathcal{D}(k)$$

$$S(k,\delta) = \{[r(1),\ldots,r(k)]:\ \psi(k;r(1),\ldots,r(k)) = 1,\ d(k;r(1),\ldots,r(k)) = \delta\}, \qquad \delta \in \mathcal{D}$$

$$\Pr\{S(k,\delta)\,|\,i,\tau\} = \int_{S(k,\delta)} p(r(1),\ldots,r(k)\,|\,i,\tau)\, dr(1) \cdots dr(k)$$

$$\bar{t}(i,\tau) = \sum_{k=\tau}^{\infty} (k-\tau)\,[1-P_F(\tau)]^{-1}\, E_{i,\tau}\{\psi(k;r(1),\ldots,r(k))\}$$

$$P((i,\tau),\delta) = \sum_{k=\tau}^{\infty} [1-P_F(\tau)]^{-1} \Pr\{S(k,\delta)\,|\,i,\tau\}$$

where $P_F(\tau)$ is the probability of stopping to declare a failure before the failure occurs
at $\tau$, i.e. the probability of false alarm when a failure occurs at time $\tau$ or later. $\mathcal{D}$ is
the set of terminal decisions for all times. $S(k,\delta)$ is the region in the sample space of
the first k residuals where the sequential rule $(\Phi,D)$ yields the terminal decision $\delta$.
Clearly, the $S(k,\delta)$'s are disjoint sets with respect to both k and $\delta$. The expressions
$\bar{t}(i,\tau)$ and $P((i,\tau),\delta)$ are respectively the conditional expected delay and the
conditional probability of declaring $\delta$, given that a type i failure has occurred at time $\tau$ and
no false alarm has been signalled before this time. $P((i,\tau),\delta)$ is called the generalized
cross-detection probability. Using these quantities, the sequential Bayes risk can be
written as


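$$U_s(\Phi,D) = \sum_{i=1}^{M} \sum_{\tau=1}^{\infty} \mu(i,\tau) \Big\{ L_F P_F(\tau) + (1-P_F(\tau)) \Big[ c(i)\,\bar{t}(i,\tau) + \sum_{\delta \in \mathcal{D}} L((i,\tau),\delta)\, P((i,\tau),\delta) \Big] \Big\} \qquad (1)$$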


Equation (1) indicates that the sequential Bayes risk is a weighted combination of
the conditional false alarm probability, expected delay to decision and cross-detection
probabilities, and the optimal sequential rule $(\Phi^*,D^*)$ minimizes such a combination.
From this vantage point, the cost functions ($L$ and $c$) and the prior distribution ($\mu$)
act as the weighting coefficients and hence serve as a basis for specifying the tradeoff
relationships among the various performance issues. The advantage of this approach
is that only the total expected cost, instead of every individual performance issue, needs
to be considered explicitly in designing a sequential rule. The drawback, however, lies
in the need to choose a set of appropriate cost functions (and the prior distribution)
when the physical problem does not have a natural set, as it generally does not. In
this case, the Bayes approach is most useful with the cost functions and the prior
distribution considered as design parameters that may be adjusted to obtain an
acceptable design.
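
To make the weighting interpretation of Equation (1) concrete, the risk can be assembled from the performance quantities once they are known. The following is a minimal sketch, assuming a finite (truncated) set of hypotheses and a linear delay cost; the function name `sequential_bayes_risk`, the dictionary layout, and all numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def sequential_bayes_risk(mu, L_F, P_F, c, t_bar, L, P):
    """Evaluate the weighted combination in Equation (1) over a finite hypothesis set.

    mu, P_F, t_bar : dicts keyed by (i, tau)
    c              : dict keyed by failure type i (linear delay cost rate)
    L, P           : dicts keyed by ((i, tau), delta)
    """
    risk = 0.0
    for (i, tau), weight in mu.items():
        # Expected terminal cost over all decisions for this hypothesis.
        ident = sum(L[h, d] * P[h, d] for (h, d) in P if h == (i, tau))
        risk += weight * (L_F * P_F[i, tau]
                          + (1.0 - P_F[i, tau]) * (c[i] * t_bar[i, tau] + ident))
    return risk


# Hypothetical single-hypothesis example.
h = (1, 1)
risk = sequential_bayes_risk(
    mu={h: 1.0}, L_F=10.0, P_F={h: 0.1}, c={1: 0.5}, t_bar={h: 4.0},
    L={(h, "correct"): 0.0, (h, "wrong"): 5.0},
    P={(h, "correct"): 0.9, (h, "wrong"): 0.1},
)
```

Raising `L_F` relative to `c` shifts the weight toward false alarm avoidance at the expense of delay, which is the sense in which the costs act as design parameters.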

The optimal terminal decision rule $D^*$ can be easily shown to be a sequence of
fixed-sample-size tests [15]. The determination of the optimal stopping rule $\Phi^*$ is a
dynamic programming problem [22]. The immense storage and computation required
make $\Phi^*$ impossible to compute, and suboptimal rules must be used.

Despite  the  impractical  nature  of  its  solution,  the  BSDP  provides  a  useful 
framework  for  designing  suboptimal  decision  rules  for  the  FDI  problem  because  of  its 
inherent  characteristic  of  explicitly  weighing  the  tradeoffs  between  detection  speed  and 
accuracy (in terms of its cost structure). A sequential decision rule specifies a set of
sequential decision regions $S(k,\delta)$, and the decision regions corresponding to the
BSDR yield the minimum risk. From this vantage point, the design of a suboptimal
rule  can  be  viewed  as  the  problem  of  choosing  a  set  of  decision  regions  that  would 
yield  a  reasonably  small  risk.  This  is  the  essence  of  the  approach  to  suboptimal  rule 
design  that  we  take  in  this  paper  and  describe  next. 
3.  DESIGN OF SUBOPTIMAL RULES

3.1  Suboptimal Rules Based on the BSDR - The Sliding Window Approximation

The  immense  computation  associated  with  the  BSDR  is  partly  due  to  the  increasing 
number  of  possible  failure  times  that  must  be  considered  as  time  progresses.  The 
remedy  for  this  problem  is  the  use  of  a  sliding  window  to  limit  the  number  of  failure 
onset  times  to  be  considered  at  each  time.  The  assumption  made  under  the  sliding 
window  approximation  is  that  essentially  all  failures  can  be  detected  within  W  time 
steps  after  they  have  occurred,  or  that  if  a  failure  is  not  detected  within  this  time  it 
will not be detected in the future. Here, the window size W is a design parameter,
and it should be chosen long enough so that detection and identification of failures
are possible, but short enough so that implementation is feasible [22].

The sliding window rule $(\Phi^W,D^W)$ divides the sample space of the sliding window
of residuals $r(k-W+1),\ldots,r(k)$, or equivalently, the space of vectors of posterior
probabilities, likelihood ratios, or log likelihood ratios of the sliding window of failure
hypotheses, into disjoint time-independent sequential decision regions $S_0,S_1,\ldots,S_N$. Here,
$N = M$ if no failure time indication is involved in the terminal decision, while $N = MW$
if a failure time estimate is also required. Because the residuals are assumed to be
Gaussian variables with variances that do not depend on the hypothesis, it is easy to
check that an equivalent set of sufficient statistics is given by [20,23]

$$\Lambda(k) = [\Lambda'_0(k),\ldots,\Lambda'_{W-1}(k)]'$$

where for $0 \le \sigma \le W-1$

$$\Lambda_\sigma(k) = [\Lambda(k;1,\sigma),\ldots,\Lambda(k;M,\sigma)]'$$

$$\Lambda(k;i,\sigma) = \sum_{s=0}^{\sigma} g'_i(s)\, V^{-1}\, r(k-\sigma+s)$$

Here $\sigma$ indexes the possible failure onset times measured relative to the present time
k (i.e. $\sigma$ corresponds to a failure onset at time $k-\sigma$). The quantities $\Lambda(k;i,\sigma)$ differ
only by an unimportant constant from the log-likelihood ratios for each hypothesis
versus the no-fail hypothesis. The sliding window decision procedure operates as
follows. At each time $k \ge W$, we form the decision statistics $\Lambda(k)$ from the window of
residual samples. If $\Lambda(k) \in S_i$ for some $i=1,\ldots,N$, we stop sampling to declare $\delta_i$; otherwise,
$\Lambda(k) \in S_0$ and we proceed without making any immediate decision. The Bayes design
problem is to determine a set of regions $S_0^*,S_1^*,\ldots,S_N^*$ that minimizes the corresponding
sequential risk $U_{sw}(\{S_i\})$ (the expression for which we will describe shortly). This
represents a functional minimization problem that is generally very difficult to solve.
A simplification of this problem is to constrain the decision regions to take on special
shapes, $S_i(f)$, that are parameterized by a fixed-dimensional vector $f$ of design
variables. A typical choice for these parametrically specified regions might be in
terms of the relative ordering of the sizes of the $\Lambda(k;i,\sigma)$ and a set of threshold levels
which correspond to the components of $f$ (see (3) below). While such a constrained
structure will lead to a suboptimal solution, the difference between the performance
resulting from using the best constrained solution and that achieved by the optimal
will be small if the constrained structure is chosen carefully. Furthermore, it is our
contention that this performance difference will typically be mostly an artifact of the
idealized problem formulation rather than a reality. That is, the unconstrained
problem seeks to find the best boundaries between decision regions, while the
constrained problem fixes the boundary shapes (e.g. straight lines or polygonal
boundaries). Given that the residual statistical model used to define the problem is
subject to error, the small extra performance gained by being able to "fine
tune" the boundary shapes will generally be dwarfed by the uncertainty arising from
modeling errors.
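
The formation of the window of decision statistics at a given time can be sketched as follows. This is a minimal illustration, assuming each statistic has the form $\sum_{s=0}^{\sigma} g'_i(s) V^{-1} r(k-\sigma+s)$ and that time is indexed from zero (so that a full window requires $k \ge W-1$, whereas the text counts time from 1); the function name `window_statistics`, the array layout, and the example data are hypothetical.

```python
import numpy as np


def window_statistics(residuals, signatures, V, k):
    """Return Lam with Lam[i, sigma] = sum_{s=0}^{sigma} g_i(s)' V^{-1} r(k - sigma + s).

    residuals  : array (T, m); residuals[t] holds r(t), zero-based time (an assumption)
    signatures : array (M, W, m); signatures[i, s] holds g_i(s)
    V          : array (m, m); residual covariance
    k          : current time index, must satisfy k >= W - 1
    """
    M, W, _ = signatures.shape
    if k < W - 1:
        raise ValueError("need a full window of residuals")
    V_inv = np.linalg.inv(V)
    Lam = np.zeros((M, W))
    for i in range(M):
        for sigma in range(W):
            # Correlate the signature with the last sigma + 1 residuals.
            for s in range(sigma + 1):
                Lam[i, sigma] += signatures[i, s] @ V_inv @ residuals[k - sigma + s]
    return Lam


# Hypothetical scalar example: one failure mode with a unit bias signature.
residuals = np.array([[1.0], [2.0], [3.0], [4.0]])
signatures = np.ones((1, 3, 1))
V = np.array([[2.0]])
Lam = window_statistics(residuals, signatures, V, k=3)
```

The triple loop is written for clarity rather than speed; in practice the statistics for successive k could be updated recursively, but that refinement is not shown here.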

In  the  remainder  of  this  paper  we  focus  our  attention  on  a  special  set  of 
parameterized  sequential  decision  regions,  because  they  are  simple  and  they  serve  well 
to  illustrate  that  the  Bayes  formulation  can  be  exploited,  in  a  systematic  fashion,  to 
obtain  simple  suboptimal  rules  that  are  capable  of  delivering  good  performance. 
These  decision  regions  are: 

$$S(j,t) = \big\{ \Lambda(k):\ \Lambda(k;j,t) > f(j,t),\ \ \epsilon^{-1}(j,t)[\Lambda(k;j,t)-f(j,t)] > \epsilon^{-1}(i,s)[\Lambda(k;i,s)-f(i,s)],\ (i,s) \ne (j,t) \big\} \qquad (3a)$$

$$S(0,-) = \big\{ \Lambda(k):\ \Lambda(k;i,s) \le f(i,s),\ i=1,\ldots,M,\ s=0,\ldots,W-1 \big\} \qquad (3b)$$

where $S(j,t)$ is the stop-to-declare-$(j,k-t)$ region and $S(0,-)$ is the continue region. See
Figure 1 for a pictorial representation of the structure of (3) in the case where there
are only two hypothesized failures, $(j,k-t)$ and $(i,k-s)$. Generally, the $\epsilon$'s may
be regarded as design parameters, but here, $\epsilon(j,t)$ is simply taken to be the standard
deviation of $\Lambda(k;j,t)$.
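
Membership in threshold regions of this kind can be checked by comparing normalized threshold excesses. The following is a minimal sketch, assuming each stop region requires its own statistic to exceed its threshold and to have the largest excess after scaling; the function name `classify`, the dictionary layout, and the numbers are hypothetical.

```python
def classify(lam, f, eps):
    """Return the declared hypothesis, or None for the continue region.

    lam, f, eps : dicts keyed by hypothesis (j, t) holding the decision
                  statistic, its threshold, and its (positive) scale.
    """
    # Normalized excess of each statistic over its threshold.
    excess = {h: (lam[h] - f[h]) / eps[h] for h in lam}
    best = max(excess, key=excess.get)
    # Ties are resolved by dictionary order here; with strict inequalities
    # a tie belongs to no stop region, an event of probability zero.
    return best if excess[best] > 0 else None


# Hypothetical thresholds and scales for two hypotheses.
f = {(1, 0): 1.0, (2, 0): 4.5}
eps = {(1, 0): 1.0, (2, 0): 0.5}
declared = classify({(1, 0): 3.0, (2, 0): 5.0}, f, eps)
quiet = classify({(1, 0): 0.5, (2, 0): 4.0}, f, eps)
```

Both statistics exceed their thresholds in the first call, and the hypothesis with the larger normalized excess is declared; in the second call neither threshold is crossed, so sampling continues.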

To evaluate $U_{sw}(f)$, the Bayes risk due to the use of (3), we need to determine the
set of probabilities $\Pr\{\Lambda(k) \in S(j,t),\ \Lambda(k-1) \in S(0,-),\ldots,\ \Lambda(W) \in S(0,-)\,|\,i,\tau\}$, $k \ge W$,
$j=1,\ldots,M$, $t=0,\ldots,W-1$, which, indeed, is the goal of many research efforts in so-called
level-crossing problems [24]. As it stands, each of the probabilities is an integral of a
kMW-dimensional Gaussian density over the compound region
$S(0,-) \times \cdots \times S(0,-) \times S(j,t)$, which, for large kMW, becomes extremely unwieldy
and difficult to evaluate. A variety of approximations and bounds [25-28] have been
developed for the evaluation of quantities such as this. We have not investigated the
utility of any of these for our problem but rather have developed a systematic
approach which is particularly appropriate for the dynamic FDI problem and which
greatly simplifies the required calculations.

As a first step in this process, we reduce the dimension of the decision statistic
$\Lambda(k)$ from MW to M. Specifically, we will base our decision process solely on the
values of the log-likelihood ratios for each of the M failure modes assuming an onset
time precisely at $\sigma = W-1$, i.e. the beginning of the window. Since we are not
estimating failure time in this case, the terminal decision to be made is simply the
identification of the failure modes. The rationale behind this simplification has several
aspects. First, in many applications, such as the aircraft sensor FDI problem [16] and
the detection of freeway incidents [21], where the failure time need not be explicitly
identified, the failure time resolution power provided by the full window of decision
statistics is not needed. Furthermore, even if failure onset time information is
desired, resolution of this time within a block of length W may often be sufficient. If
not, one can imagine a two-level decision-making structure in which one first
determines the failure type (using the procedure to be described) and then estimates
the onset time. Note that this overall system will have decidedly lower complexity
than one based on simultaneous detection, identification and onset time estimation.
In assessing the utility of the approach just described, one must make sure that the
resulting decision algorithm does not have a significantly elevated probability of
incorrectly identifying the failure type. That is, if a failure of type i occurs at a time
before the end of the window and if a detection occurs, one would want the
subsequent identification to also be i with high probability. Determining whether this
is the case can be done completely in terms of the failure signatures [29]. We can
expect good performance if the cross-correlations among signatures for failures of the same
type at different times are significantly higher than the cross-correlations of signatures
corresponding to different failure types. We note that this is often the case in practice,
and in fact an often-used goal for the residual generation process is that of producing
signatures which are orthogonal or which at least lie in trivially overlapping subspaces
[1,22].

A decision rule of the type just described consists of sequential decision regions
that are similar to (3) but are only defined in terms of the M components
$\Lambda(k;i,W-1)$, $i=1,\ldots,M$:

$$\Lambda_{W-1}(k) = [\Lambda(k;1,W-1),\ \Lambda(k;2,W-1),\ldots,\ \Lambda(k;M,W-1)]' \qquad (4a)$$

$$S_j = \big\{ \Lambda_{W-1}(k):\ \Lambda(k;j,W-1) > f_j,\ \ \epsilon^{-1}(j,W-1)[\Lambda(k;j,W-1)-f_j] > \epsilon^{-1}(i,W-1)[\Lambda(k;i,W-1)-f_i],\ i \ne j \big\} \qquad (4b)$$

$$S_0 = \big\{ \Lambda_{W-1}(k):\ \Lambda(k;j,W-1) \le f_j,\ j=1,\ldots,M \big\} \qquad (4c)$$

where $S_j$ is the stop-to-declare-j region and $S_0$ is the continue region.


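The risk for using (4) is

$$U_{sw}(f) = L_F \sum_{i=1}^{M} \sum_{\tau=W+1}^{\infty} \mu(i,\tau) \sum_{k=W}^{\tau-1} \sum_{j=1}^{M} \Pr\{\Lambda_{W-1}(k) \in S_j,\ B_0(k-1)\,|\,0,-\}$$
$$\qquad + \sum_{i=1}^{M} \sum_{\tau=1}^{\infty} \mu(i,\tau) \sum_{k=\max[W,\tau]}^{\infty} \sum_{j=1}^{M} \big[c(i)(k-\tau) + L(i,j)\big] \Pr\{\Lambda_{W-1}(k) \in S_j,\ B_0(k-1)\,|\,i,\tau\}$$

where $B_0$ is the event defined below:

$$B_0(k) = \{\Lambda_{W-1}(k) \in S_0,\ \ldots,\ \Lambda_{W-1}(W) \in S_0\}$$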

The first term in the expression for $U_{sw}(f)$ represents the portion of the risk due to
false alarms. The key expression here is $\Pr\{\Lambda_{W-1}(k) \in S_j,\ B_0(k-1)\,|\,0,-\}$, which is the
probability that no detections have been made before time k but that an identification
for a type j failure is made at time k, given that no failure has occurred. The
remaining portion of $U_{sw}(f)$ represents that part of the risk corresponding to detection
delay and the possibility of incorrect identification. Here the key quantity is the
probability $\Pr\{\Lambda_{W-1}(k) \in S_j,\ B_0(k-1)\,|\,i,\tau\}$, which is the probability that a detection is
first made at time k and that the failure is identified as being of type j given that a
type i failure occurred at time $\tau \le k$. The calculation of these probabilities is specified
by the following recursions:

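$$\Pr\{\Lambda_{W-1}(k+1) \in S_j\,|\,B_0(k),i,\tau\} = \Big[ \int_{S_0} p(\Lambda_{W-1}(k)\,|\,B_0(k-1),i,\tau)\, d\Lambda_{W-1}(k) \Big]^{-1} \times$$
$$\qquad \int_{S_j} \int_{S_0} p(\Lambda_{W-1}(k+1)\,|\,\Lambda_{W-1}(k),B_0(k-1),i,\tau)\ p(\Lambda_{W-1}(k)\,|\,B_0(k-1),i,\tau)\, d\Lambda_{W-1}(k)\, d\Lambda_{W-1}(k+1) \qquad (5)$$

$$\Pr\{\Lambda_{W-1}(k) \in S_j,\ B_0(k-1)\,|\,i,\tau\} = \Pr\{B_0(k-1)\,|\,i,\tau\} \int_{S_j} p(\Lambda_{W-1}(k)\,|\,B_0(k-1),i,\tau)\, d\Lambda_{W-1}(k), \qquad j=1,\ldots,M \qquad (6)$$

with

$$\Pr\{\Lambda_{W-1}(W) \in S_j\,|\,i,\tau\} = \int_{S_j} p(\Lambda_{W-1}(W)\,|\,i,\tau)\, d\Lambda_{W-1}(W) \qquad (7)$$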


Note that in essence what we are calculating in (5)-(7) are several different level
crossing probabilities, and as we have just shown, it is these calculations that are the
central elements to be determined in evaluating the performance of a hypothesized
detection rule. For M small, numerical integration of (5)-(7) becomes manageable,
assuming that the required integrands are available.

Unfortunately, the transition density $p(\Lambda_{W-1}(k+1)\,|\,\Lambda_{W-1}(k),B_0(k-1),i,\tau)$
required in (5) is difficult to calculate, because $\Lambda_{W-1}(k)$ is not a Markov process. In
order to facilitate the computation of these probabilities, we use an approximation for
this transition density obtained by developing an approximate Markovian model for
the evolution of $\Lambda_{W-1}(k)$. A simple, but quite useful approximation is an M-dimensional
Gauss-Markov process $\ell(k)$ that is defined by

$$\ell(k+1) = A\,\ell(k) + \xi(k+1) \qquad (8a)$$

$$\mathrm{cov}\{\xi(k)\,\xi'(t)\} = \Gamma\,\delta_{k,t} \qquad (8b)$$

where $A$ is an $M \times M$ constant matrix, and $\xi$ is a white Gaussian sequence (with
covariance equal to the $M \times M$ matrix $\Gamma$) uncorrelated with $\ell(k)$. The conditional
mean of $\xi(k)$ will be specified shortly. The reason for choosing this model is twofold.


First,  just  as  Aw_|(k),  Kk)  is  Gaussian.  Second,  Kk)  is  Markov  so  that  its  transition 
density  can  be  readily  determined.  In  order  to  have  the  evolution  of  Kk)  match  that 
of  Aw_i  (k)  as  closely  as  possible,  we  choose  the  matrices  A  and  T  and  the  conditional 
mean  E^,{£ (k)}  of  £(k)  under  the  hypothesis  G,t)  so  that 


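\[ E_{i,\tau}\{\ell(k)\} = E_{i,\tau}\{\Lambda_{W-1}(k)\} \tag{9a} \]

\[ E_{0,-}\{\ell(k)\,\ell'(k)\} = E_{0,-}\{\Lambda_{W-1}(k)\,\Lambda'_{W-1}(k)\} \tag{9b} \]

\[ E_{0,-}\{\ell(k)\,\ell'(k+1)\} = E_{0,-}\{\Lambda_{W-1}(k)\,\Lambda'_{W-1}(k+1)\} \tag{9c} \]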

That is, we have matched the marginal density and the one-step cross-covariance of
$\ell(k)$ to those of $\Lambda_{W-1}(k)$. A straightforward calculation shows that (8)-(10) uniquely
specify


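\[ A = \Sigma_1' \Sigma_0^{-1} \tag{10a} \]

\[ \Gamma = \Sigma_0 - \Sigma_1' \Sigma_0^{-1} \Sigma_1 \tag{10b} \]

\[ E_{i,\tau}\{\xi(k+1)\} = E_{i,\tau}\{\Lambda_{W-1}(k+1)\} - A\,E_{i,\tau}\{\Lambda_{W-1}(k)\} \tag{10c} \]

where

\[ \Sigma_0 = E_{0,-}\{\Lambda_{W-1}(k)\,\Lambda'_{W-1}(k)\} = \sum_{t=0}^{W-1} G_t V^{-1} G_t' \]

\[ \Sigma_1 = E_{0,-}\{\Lambda_{W-1}(k)\,\Lambda'_{W-1}(k+1)\} = \sum_{t=0}^{W-2} G_{t+1} V^{-1} G_t' \]

\[ E_{i,\tau}\{\Lambda_{W-1}(k)\} =
\begin{cases}
0, & \tau > k \\
\sum_{t=0}^{k-\tau} G_{t-k_0} V^{-1} g_i(t), & \tau \le k,\ k_0 = k-W+1-\tau < 0 \\
\sum_{t=0}^{W-1} G_t V^{-1} g_i(t+k_0), & k_0 = k-W+1-\tau \ge 0
\end{cases} \]

\[ G_t = [\,g_1(t), \ldots, g_M(t)\,]' \]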

Clearly, $\Sigma_0^{-1}$ exists if the failure signatures $[g_i'(0), \ldots, g_i'(W-1)]$, $i = 1, \ldots, M$, are
linearly independent. This condition is equivalent to the statement that there is
sufficient information in a window of length W to distinguish among all of the M
possible failure modes, assuming that if one of these failures has occurred, it did so at
the beginning of the window. A sufficient condition for A to be stable (i.e. for all of
its eigenvalues to have magnitude less than unity) and for $\Gamma$ to be invertible is that either
$G_0$ or $G_{W-1}$ is of rank M. (See the appendix for a discussion of the necessary and
sufficient conditions for the invertibility of $\Gamma$ and the stability of A.)
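The moment-matching formulas (10a)-(10b) can be illustrated numerically. The sketch below is only an illustration: the function name `markov_model`, the array layout, and the signature values are assumptions made here and are not taken from the example of Section 4.

```python
import numpy as np

def markov_model(G, V):
    """Return (Sigma0, Sigma1, A, Gamma) as in (10), assuming W >= 2.

    G: array of shape (W, M, m); G[t] is the signature matrix G_t.
    V: (m, m) covariance of the residual under no failure.
    """
    Vinv = np.linalg.inv(V)
    W = G.shape[0]
    sigma0 = sum(G[t] @ Vinv @ G[t].T for t in range(W))
    sigma1 = sum(G[t + 1] @ Vinv @ G[t].T for t in range(W - 1))
    A = sigma1.T @ np.linalg.inv(sigma0)      # (10a)
    Gamma = sigma0 - A @ sigma1               # (10b), since A = Sigma1' Sigma0^{-1}
    return sigma0, sigma1, A, Gamma

# Hypothetical signatures (M = 2, m = 2, W = 4): a constant bias and a bias plus ramp.
W = 4
G = np.array([[[1.0, 0.0], [0.5, 0.25 * (t + 1)]] for t in range(W)])
sigma0, sigma1, A, Gamma = markov_model(G, np.eye(2))
print("eigenvalue magnitudes of A:", np.abs(np.linalg.eigvals(A)))
print("eigenvalues of Gamma:", np.linalg.eigvalsh(Gamma))
```

In this hypothetical case $G_0$ has rank M, so the sufficient condition stated above applies and the printed eigenvalues should show a stable A and an invertible $\Gamma$.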

As an alternative to the model specification just given, it is possible to choose other
Markov approximations for $\Lambda_{W-1}(k)$. For example, one could match the n-step
cross-covariance (1<n<W) instead of matching the one-step cross-covariance as in
(10). The suitability of a criterion for choosing the matrices A and $\Gamma$, such as (9) and
(10), depends directly on the failure signatures under consideration and may be
examined as an issue separate from the decision rule design problem. Also, a higher
order Markov process may be used to approximate $\Lambda_{W-1}(k)$. However, the increase
in the computational complexity may negate the benefits of the improved
approximation. Finally, we emphasize that the statistic $\ell(k)$, as we have described it
here, is not an observable quantity. That is, it cannot be computed from the residuals.
Rather, $\ell(k)$ is an artificial process introduced in order to obtain approximations for
the calculation of the statistics of $\Lambda_{W-1}(k)$. Later in this section we will describe a
suboptimal test statistic to replace $\Lambda_{W-1}(k)$ which is computable from the residuals
and which is also Markov.

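Using the model we have developed for $\ell(k)$ we can approximate the required
probabilities by substituting $\ell(k)$ for $\Lambda_{W-1}(k)$ in the calculations. That is,

\[ \Pr\{\Lambda_{W-1}(k) \in S_j,\, S_0(k-1) \mid i, \tau\} \approx \Pr\{\ell(k) \in S_j,\, S_0(k-1) \mid i, \tau\}, \quad j = 0, 1, \ldots, M, \quad k > W \]

and

\[ \Pr\{\ell(k) \in S_j,\, S_0(k-1) \mid i, \tau\} = \Pr\{S_0(k-1) \mid i, \tau\} \int_{S_j} p(\ell(k) \mid S_0(k-1), i, \tau)\, d\ell(k) \tag{11} \]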

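Assuming $\Gamma^{-1}$ exists, we have

\[ p(\ell(k+1) \mid S_0(k), i, \tau) = \Big[ \int_{S_0} p(\ell(k) \mid S_0(k-1), i, \tau)\, d\ell(k) \Big]^{-1} \int_{S_0} p(\xi(k+1) = \ell(k+1) - A\ell(k) \mid i, \tau)\; p(\ell(k) \mid S_0(k-1), i, \tau)\, d\ell(k) \tag{12} \]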

where $p(\xi(k) \mid i, \tau)$ is the Gaussian density of $\xi(k)$ under the failure $(i,\tau)$. The key
simplification that results from using the Markovian approximation is

\[ p(\ell(k+1) \mid \ell(k), S_0(k-1), i, \tau) = p(\ell(k+1) \mid \ell(k), i, \tau) = p(\xi(k+1) = \ell(k+1) - A\ell(k) \mid i, \tau) \]

Because of this, the integrands in (12) are readily obtained (the first comes from the
previous step of the recursion), and thus the integrals in (12) can be calculated more
easily.
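To make the recursion (11)-(12) concrete, the following sketch treats a scalar case (M = 1) in which the no-decision region is assumed to be $S_0 = (-\infty, f)$, truncated numerically at eight stationary standard deviations. It propagates the unnormalized product $\Pr\{S_0(k-1)\}\,p(\ell(k) \mid S_0(k-1))$ on Gauss-Legendre nodes. This is not the quadrature algorithm of [22]; the function name, the parameter values, and the choice of decision region are assumptions made for illustration.

```python
import math
import numpy as np

def crossing_probabilities(a, gamma, mean_xi, f, steps, n_nodes=64):
    """First-crossing and survival probabilities of l(k+1) = a l(k) + xi(k+1)."""
    sigma0 = math.sqrt(gamma / (1.0 - a * a))        # stationary std. dev. (no failure)
    lo = -8.0 * sigma0                               # numerical stand-in for -infinity
    t, w = np.polynomial.legendre.leggauss(n_nodes)  # nodes and weights on [-1, 1]
    x = 0.5 * (f - lo) * t + 0.5 * (f + lo)          # nodes mapped to S_0 = [lo, f]
    w = 0.5 * (f - lo) * w
    sd = math.sqrt(gamma)
    Phi = np.vectorize(lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0))))
    # q(x) = Pr{S_0(k-1)} p(l(k) = x | S_0(k-1)), started from the stationary density.
    q = np.exp(-0.5 * (x / sigma0) ** 2) / (sigma0 * math.sqrt(2.0 * math.pi))
    survive = [float(w @ q)]                         # Pr{no crossing so far}
    cross = []                                       # Pr{first crossing at this step}
    for k in range(steps):
        m = a * x + mean_xi[k]                       # mean of l(k+1) given l(k) = x
        cross.append(float(w @ (q * (1.0 - Phi((f - m) / sd)))))
        kernel = np.exp(-0.5 * ((x[:, None] - m[None, :]) / sd) ** 2)
        kernel /= sd * math.sqrt(2.0 * math.pi)
        q = kernel @ (w * q)                         # unnormalized one-step form of (12)
        survive.append(float(w @ q))
    return np.array(cross), np.array(survive)

# No-failure case with illustrative parameters: zero-mean driving noise.
cross, survive = crossing_probabilities(a=0.8, gamma=0.36, mean_xi=np.zeros(20),
                                        f=3.0, steps=20)
print(cross[:3], survive[:3])
```

A failure hypothesis $(i,\tau)$ would enter only through `mean_xi`, which would then hold the conditional means given by (10c).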

In the event that $\Gamma$ is not invertible, the density for $\xi(k)$ is degenerate and (12) is
more difficult to evaluate. As discussed in the appendix, the invertibility of $\Gamma$ is
related to the distinguishability of the M failure modes. Consequently, in any well-posed
failure detection problem, W will be chosen so that the invertibility of $\Gamma$ is
assured.


Non-Window  Sequential  Decision  Rules 


Here we describe another simple decision rule that has the same decision regions as
the simplified sliding window rule (4), but in which the M-dimensional vector of statistics, z, is
obtained differently, as follows:

\[ z(k+1) = A\,z(k) + B\,r(k+1) \tag{13} \]


where A is a constant stable M × M matrix, and B is an M × m constant matrix of rank
M. Unlike the Markov model $\ell(k)$ that approximates $\Lambda_{W-1}(k)$, z(k) is a realizable
Markov process driven by the residuals. The advantages of using z(k) as the decision
statistic are: 1) less storage is required, because residual samples need not be stored
as is necessary in the sliding window scheme, and 2) since z(k) is Markov, the required
probability integrals are of the forms (11) and (12), so that the same integration
algorithm can be directly applied. Of course, z(k) is a suboptimal decision statistic.
One could, if desired, use a higher-order model for z(k) so that it more nearly equals
$\Lambda_{W-1}(k)$, but the added computational complexity may negate the advantages.


In order to form the statistic z(k), we need to choose the matrices A and B.
When the failure signatures under consideration are constant biases and $M \le m$, B can
simply be set equal to $G_0 V^{-1}$ (provided $G_0$ is of rank M), and A can be chosen to be
$\alpha I$, where $0 < \alpha < 1$. Then, the term Br in (13) provides the correlation of the
residuals with the signatures as in (2), while the time constant $\alpha^{-1}$ characterizes the
memory span of z(k) just as W characterizes that of the sliding window statistic.
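The recursion (13) with $A = \alpha I$ and $B = G_0 V^{-1}$ can be sketched in a few lines. The numerical values of $\alpha$, $G_0$, and the bias below are hypothetical, and the function name `markov_statistic` is introduced here only for illustration.

```python
import numpy as np

def markov_statistic(residuals, A, B):
    """Run z(k+1) = A z(k) + B r(k+1) from z(0) = 0 and return every z(k)."""
    z = np.zeros(A.shape[0])
    history = []
    for r in residuals:
        z = A @ z + B @ r
        history.append(z)
    return np.array(history)

alpha = 0.9                                   # illustrative memory parameter
G0 = np.array([[1.0, 0.0], [0.5, 0.25]])      # hypothetical rank-2 signature matrix
V = np.eye(2)
A = alpha * np.eye(2)
B = G0 @ np.linalg.inv(V)

bias = np.array([1.0, 0.0])                   # noise-free residual equal to a constant bias
z = markov_statistic(np.tile(bias, (300, 1)), A, B)
print(z[-1], B @ bias / (1.0 - alpha))        # the two vectors should agree closely
```

With a constant noise-free residual, z(k) settles at $B\,r/(1-\alpha)$, which shows how $\alpha$ sets the effective memory of the statistic without storing past residuals.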

More generally, if we consider the case where failure signatures are not constant
biases, rank($G_0$) < M, or m < M, the choice of A may still be handled in the same way
as in the previous case, but the selection of B is more involved. With some insight
into the nature of the signatures, a reasonable choice of B can often be made in order
to have distinct components of z(k) respond primarily to the corresponding failure.
To illustrate how this may be accomplished, we will consider an example with two
failure modes (M = 2) and an m-dimensional residual vector. Let

\[ g_1(k-\tau) = \beta_1 \]
\[ g_2(k-\tau) = \beta_2\,(k-\tau+1) \]

That is, $g_1$ is a constant bias, and $g_2$ is a ramp. If $\beta_1$ and $\beta_2$ are orthogonal, a simple
choice of B is available:


This choice may often be acceptable even when $\beta_1'\beta_2 \neq 0$. It is clearly not of any use
when $\beta_1$ and $\beta_2$ are multiples of the same vector $\beta$, or when they are scalars
(corresponding to m = 1), as the rank of B is less than 2. In these cases we can
consider processing groups of residuals. For example, suppose we batch process every
two residual samples together, i.e. we use the augmented residual sequence
$\bar{r}(k) = [\,r'(2k-1),\ r'(2k)\,]'$, $k = 1, 2, \ldots$. In this case we can set B to be


Thus, this B is of dimension M × 2m and has rank M (= 2). The first and second rows
of B capture the constant bias and ramp nature of $g_1$ and $g_2$, respectively. The use of
the modified residual $\bar{r}(k)$ in this case causes no adverse effect, since it only lengthens
slightly the interval between times when terminal decisions can be made. Clearly one
can consider further augmentation and batch processing of the residuals, and in
general the logical choice of B is one in which each row of B contains in sequence the
initial values of the corresponding failure signature. In this case the mean values of
z(k) will exactly equal those of $\Lambda_{W-1}(k)$ for a number of time steps following a failure
equal to the level of augmentation used. The utility of this approach clearly depends
on the temporal structure of the failure signatures. For problems where the signatures
vary drastically as a function of the elapsed time and the distinguishability among
failures depends essentially on these variations, the effectiveness of using z(k)
diminishes. In such cases the sliding window decision rule should provide better
performance, although it should be noted that one would then typically have to
use a comparatively long window in order to obtain an adequate degree of
distinguishability.


3.2 Risk Evaluation


An algorithm based on 1-dimensional Gaussian quadrature formulas [30,31] has
been developed to compute the probability integrals of (11) and (12) for the case
M = 2. (It can be extended to higher dimensions with an increase in computation.)
The details of this quadrature algorithm are described in [22]. Its accuracy has been
assessed via comparison with Monte Carlo simulations (see the numerical example in
Section 4). With this algorithm we can evaluate the performance probabilities and
risks associated with the suboptimal decision rules described above.


In the absence of a failure, the conditional density for $\ell(k)$ in (12) has been observed
in numerous examples to essentially reach a steady state at some finite time $T > W$.*
Assuming this is the case, we have for $k \ge \tau \ge T$,

\[ \Pr\{\ell(k) \in S_j \mid S_0(k-1), 0, -\} = b_j \tag{14} \]

\[ \Pr\{\ell(k) \in S_j,\ \ell(k-1) \in S_0, \ldots, \ell(\tau) \in S_0 \mid S_0(\tau-1), i, \tau\} = b_j(k-\tau \mid i) \tag{15} \]


That is, once steady state is reached, only the elapsed time since failure is
important. Generally, failures occur infrequently, and decision rules with low false
alarm probability are employed. Thus, it is reasonable to assume 1) $p \ll 1$, i.e.
$(1-p)^{T} \approx 1$, and 2) $\Pr\{S_0(T) \mid 0, -\} \approx 1$. The sequential risk associated with (4) for
M = 2 can be approximated by

\[ U_{sw}(f) = P_F L_F + (1-P_F) \sum_{i=1}^{2} \sigma(i) \sum_{j=1}^{2} \sum_{t=0}^{\infty} [\,c(i)\,t + L(i,j)\,]\, b_j(t \mid i) \tag{16} \]


where

\[ P_F = \frac{(1-p)(1-b_0)}{1 - b_0(1-p)} \]

$P_F$ is the unconditional false alarm probability, i.e. the probability of one false alarm
over all time.

* Unfortunately, we have not been able to prove such convergence behavior using elementary
techniques. More advanced function-theoretic methods may be necessary.
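As a plausibility check on the expression for $P_F$, the sketch below compares it with a direct summation. The summation rests on an interpretation that is assumed here rather than stated in the text: a false alarm at step k requires no failure through step k (probability $(1-p)^k$), no alarm during the first k-1 steps (probability $b_0^{k-1}$), and an alarm at step k (probability $1-b_0$). The value of $b_0$ is hypothetical; p = .0002 is the value listed in Table 3.

```python
def false_alarm_probability(p, b0):
    """Closed form for P_F given in the text."""
    return (1.0 - p) * (1.0 - b0) / (1.0 - b0 * (1.0 - p))

def false_alarm_probability_series(p, b0, terms=50000):
    """Sum over k of (1-p)^k * b0^(k-1) * (1-b0), under the assumed interpretation."""
    total = 0.0
    weight = 1.0 - p                  # (1-p)^k * b0^(k-1) for k = 1
    for _ in range(terms):
        total += weight * (1.0 - b0)
        weight *= (1.0 - p) * b0
    return total

p, b0 = 0.0002, 0.999                 # b0 is an illustrative steady-state value
pf_closed = false_alarm_probability(p, b0)
pf_series = false_alarm_probability_series(p, b0)
print(pf_closed, pf_series, 1.0 / (1.0 - b0))   # last value: mean time between false alarms
```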

Next, we seek to replace the infinite sum over t in (16) by the finite sum up to
t = Q plus a term approximating the remainder of the infinite sum. Suppose we have
been sampling for Q steps since a failure occurred. Define

\[ P_t(j \mid i) = \Pr\{\ell(t) \in S_j \mid S_0(t-1), i, 0\}, \quad j = 0, 1, 2 \]

If we stop computing the probabilities after Q, we may approximate

\[ P_t(j \mid i) \approx P_Q(j \mid i), \quad j = 0, 1, 2, \quad t > Q \tag{17} \]

That is, we assume that after a detection delay of Q steps the conditional probability of
detection at any time, given no detection at any previous time, reaches a constant
steady-state value. This is the same as assuming that beyond Q steps of delay, the
additional detection delay is exponentially distributed. This assumption is reasonable
for constant failure signatures or signatures that reach steady state. While the
assumption may not be valid for signatures which continue to vary, the effect of this
approximation is generally quite small, since for all i and j one typically can choose Q
so that

\[ b_j(t \mid i) \approx 0, \quad t > Q \]

That is, for each failure mode the probability of a detection delay greater than Q steps
is negligible.
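Under approximation (17) the tail of each infinite sum becomes geometric and can be summed in closed form. The sketch below is one way to carry this out. It assumes that $b_0(Q \mid i)$ is the probability of no decision through step Q, so that for $t > Q$, $b_j(t \mid i) = b_0(Q \mid i)\, P_Q(0 \mid i)^{t-Q-1}\, P_Q(j \mid i)$. The function name and the test data are hypothetical, and the resulting expressions are a reconstruction under that assumption rather than formulas quoted from the paper.

```python
import numpy as np

def decision_probs_and_mean_delay(b, P_Q):
    """Sum b_j(t|i) and t*b_j(t|i) over all t, using (17) for t > Q.

    b:   array of shape (Q+1, 3); b[t, j] = b_j(t|i) for t = 0..Q, j = 0, 1, 2.
    P_Q: length-3 array of the steady-state conditional probabilities P_Q(j|i).
    """
    Q = b.shape[0] - 1
    q = P_Q[0]                                   # per-step probability of no decision
    surv = b[Q, 0]                               # probability of no decision through Q
    t = np.arange(Q + 1)
    probs = b[:, 1:].sum(axis=0) + surv * P_Q[1:] / (1.0 - q)
    tail = surv * ((Q + 1) / (1.0 - q) + q / (1.0 - q) ** 2) * P_Q[1:].sum()
    mean_delay = (t[:, None] * b[:, 1:]).sum() + tail
    return probs, mean_delay

# Synthetic check: per-step probabilities that are constant from t = 0 onward.
P = np.array([0.7, 0.2, 0.1])
Q = 5
steps = np.arange(Q + 1)
b = np.column_stack([P[0] ** (steps + 1), P[0] ** steps * P[1], P[0] ** steps * P[2]])
probs, mean_delay = decision_probs_and_mean_delay(b, P)
print(probs, mean_delay)
```

For this synthetic input the exact answers are known (decision probabilities 2/3 and 1/3, mean delay 0.7/0.3), which gives a simple check of the closed-form tail.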


Substituting (17) in (16), we obtain

\[ U_{sw}(f) = P_F L_F + (1-P_F) \sum_{i=1}^{2} \sigma(i) \Big[ c(i)\,\bar{t}_i + \sum_{j=1}^{2} L(i,j)\,P(j \mid i) \Big] \tag{18} \]

where
Here, $\bar{t}_i$ is the conditional expected delay to decision, given that a type i failure has
occurred, and $P(j \mid i)$ is the conditional probability of declaring a type j failure, given
that failure i has occurred. From the assumption that $\Pr\{S_0(T) \mid 0, -\} \approx 1$ and the
steady-state condition (14), it can be shown that the mean time between false alarms
is simply $(1-b_0)^{-1}$. Now all the probabilities in (18)-(20) can be computed by using
our quadrature algorithm. Note that the risk expression (18) consists only of finite
sums and can be evaluated with a reasonable amount of computational effort. With
such an approximation of the sequential risk, we are able to consider the problem of
determining the decision regions (i.e. the thresholds $f_j$) that minimize the risk.

It should be noted that we could consider choosing a set of thresholds that
minimizes a weighted combination of certain detection probabilities ($P(j \mid i)$), the
expected delay ($\bar{t}_i$), and the mean time between false alarms $(1-b_0)^{-1}$. Although
such an objective function will not result in a Bayesian design in general, it is a valid
design criterion that may be useful for some applications.


3.3 Risk Minimization


The risk minimization has two features that deserve special attention. First, the
sequential risk is not a simple function of the threshold f, and its derivatives with
respect to f are not readily available. Second, calculating the risk is a computationally
intensive task. Therefore, the minimum-seeking procedure to be used must require
few function evaluations, and it must not require derivatives. For these reasons we
chose to use the Sequence-of-Quadratic-Programs (SQP) algorithm studied by
Winfield [32] to solve this problem, because it does not need any derivative
information and it appears to require fewer function evaluations than other well-known
algorithms [32]. Furthermore, the SQP is simple, and it has quadratic
convergence. Very briefly, the algorithm consists of the following. At each step of
the iteration, a quadratic surface is fitted to the risk function locally using the
preceding guesses at the optimal value of f and the corresponding risk function
evaluations. The resulting quadratic model is minimized over a constrained region
(hence the name SQP). The risk function is evaluated at this minimum and is used in
the surface fitting of the next iteration. The details of the application of SQP to risk
minimization are reported in [22].
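The idea of fitting a quadratic surface to past risk evaluations and minimizing it over a bounded region can be sketched as follows. This is a much-simplified stand-in and not Winfield's algorithm [32]: the least-squares fit over all past points, the grid search over the box, the box-halving rule, and the test function `toy_risk` are all assumptions chosen for brevity.

```python
import numpy as np

def quad_features(X):
    """Design matrix of a full quadratic in two variables."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x, y, x * x, x * y, y * y])

def minimize_by_quadratic_fits(risk, f0, radius=1.0, iters=30):
    """Derivative-free search over two thresholds using local quadratic models."""
    f0 = np.asarray(f0, dtype=float)
    offsets = np.array([[0, 0], [1, 0], [-1, 0], [0, 1], [0, -1], [1, 1]], dtype=float)
    X = f0 + radius * offsets                     # six starting guesses
    y = np.array([risk(p) for p in X])
    grid = np.linspace(-1.0, 1.0, 21)
    cand = np.array([[u, v] for u in grid for v in grid])
    for _ in range(iters):
        best = X[np.argmin(y)]
        coef, *_ = np.linalg.lstsq(quad_features(X), y, rcond=None)
        trial = best + radius * cand              # candidate points inside the box
        new = trial[np.argmin(quad_features(trial) @ coef)]
        y_new = risk(new)                         # one true risk evaluation per iteration
        if y_new >= y.min():
            radius *= 0.5                         # no improvement: shrink the box
        X, y = np.vstack([X, new]), np.append(y, y_new)
    return X[np.argmin(y)], float(y.min())

def toy_risk(f):
    """Hypothetical smooth stand-in for the risk, minimized at (1, -0.5)."""
    return (f[0] - 1.0) ** 2 + 2.0 * (f[1] + 0.5) ** 2 + 0.5

f_opt, risk_opt = minimize_by_quadratic_fits(toy_risk, [0.0, 0.0])
print(f_opt, risk_opt)
```

Each iteration costs a single evaluation of the true function, which matters when, as in the text, every risk evaluation requires a full quadrature pass.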


4.  NUMERICAL  EXAMPLE 


Now we discuss an application of the suboptimal rule design methodology
developed in this paper. We consider the detection of two possible failure modes
(without identifying the failure time). The residual is a 2-dimensional vector, and the
vector failure signatures $g_i(t)$, i = 1,2, as functions of the elapsed time t are shown in
Table 1. The signature of the first failure is simply a constant vector. The first
component of $g_2(t)$ is a constant, while the second component is a ramp. We have
chosen to examine these types of signatures because they are simple and
describe a large variety of failure signatures that are commonly seen in practice. For
simplicity, we have chosen V, the covariance of r, to be the identity matrix.

Both a simplified sliding window rule (that uses $\Lambda_{W-1}$) and a rule using the
Markov statistic z were examined. The parameters associated with $\Lambda_{W-1}$, $\ell$, and z are
shown in Table 2, and the cost functions and the prior probability are shown in Table
3. To facilitate discussion, we introduce the following terminology. We refer to a
Monte Carlo simulation of the sliding window rule as SW, a simulation of the rule
using the Markov statistic z as Markov Implementation (MI), and a simulation of the
non-implementable decision process using the approximation $\ell$ as Markov
Approximation (MA). (All simulations are based on 10,000 trajectories.) The
notation Q20 refers to the results of applying the quadrature algorithm to calculate the
various performance indices of the sliding window rule while using $\ell$ to approximate
$\Lambda_{W-1}$ (12).

The results of SW, MA, and Q20 for the thresholds [8.85, 12.05] are shown in
Figures 2-6 (see (15) for the definition of notation). The quadrature results Q20 are
very close to those obtained by Monte Carlo simulations for MA, indicating the
excellent accuracy of the quadrature algorithm. In comparing SW with MA, it is
evident that the Markov approximation slightly underestimates the false alarm rate of
the sliding window rule (SW). However, the response of the Markov approximation
to failure is very close to that of the sliding window rule. In the present example,
$\Lambda_{W-1}$ is a 7th-order process, while its approximation $\ell$ is only of first order. In view
of this fact we can conclude that $\ell$ provides a very reasonable and useful
approximation of $\Lambda_{W-1}$.

The successive choices of thresholds by SQP for the sliding window rule are
plotted in Figure 7. Note that we have not carried the SQP algorithm so far that the
successive choices of thresholds are, say, within .001 of each other. This is because
near the optimum the expected risk is relatively insensitive to small changes in f.
This implies that fine scale optimization is not generally worthwhile. This conclusion
is supported by the fact that the residual signature models used in designing failure
detection systems are typically idealizations, and thus minor improvements in Bayes
risk are generally an artifact of the mathematical formulation. Furthermore, it should
be remembered that the use of the Bayes formulation is simply for the purpose of
providing a mechanism for determining high-performance decision rules, and thus the
precise optimization of the Bayes risk is not the central issue. In fact, the cost
parameters L, c, $\mu$, and W should be used as design parameters. In the event that the
optimal thresholds resulting from a particular choice of Bayes risk do not provide the
desired detection performance, the design parameters may be adjusted and the SQP
may be repeated to get a new design. A practical alternative method is to make use of
the list of performance indices (e.g. $P(j \mid i)$) that are generated in the risk calculation,
and choose a pair of thresholds that yields the desired performance tradeoff.

The performance of the decision rules using $\Lambda_{W-1}$ and z, as determined by SQP, is
shown in Figures 8-12. (The thresholds for $\Lambda_{W-1}$ are [8.85, 12.05] and those for z are
[6.29, 11.69].) We note that MI has a higher false alarm rate than SW. The speeds of
detection for the two rules are similar. While MI has a slightly higher type 1 correct
detection probability ($\sum_{s=0}^{t} b_1(s \mid 1)$) than SW, SW has a consistently higher type 2
correct detection probability ($\sum_{s=0}^{t} b_2(s \mid 2)$) than MI. By raising the thresholds of the
rule using z appropriately, we can decrease the false alarm rate of MI down to that of
SW, with an increase in detection delay and a slightly improved correct detection
probability for the type 2 failure. Thus the sliding window rule is slightly superior to
the rule using z in the sense that when both are designed to yield a comparable false
alarm rate, the latter will have longer detection delays and a slightly lower correct
detection probability for a type 2 failure. In view of the fact that a decision rule using
z is much simpler to implement, it is worthy of being considered as an alternative to
the sliding window rule.

In summary, this example illustrates the utility of our approach. The quadrature
algorithm has been shown to be accurate and useful, and the Markov approximation
of $\Lambda_{W-1}$ by $\ell$ is a valid one. The simplicity and usefulness of the SQP algorithm have
also been demonstrated. Finally, the Markov decision statistic z has been shown to be
a worthy alternative to the sliding window statistic $\Lambda_{W-1}$.


5.  CONCLUSION 


A  computationally  feasible  methodology  based  on  the  Bayesian  approach  has  been 
developed  for  designing  suboptimal  sequential  decision  rules  for  FDI.  This 
methodology  was  applied  to  a  numerical  example,  and  the  results  indicate  that  it  is  a 
potentially  useful  design  approach. 
APPENDIX

Theorem 

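Consider the Gauss-Markov process $\ell(k)$ specified by (8)-(10) in Section 3.1. Given
$E\{\Lambda_{W-1}(k)\,\Lambda'_{W-1}(k)\} = \Sigma_0 > 0$, A has at least one unity eigenvalue and $\Gamma$ is
semi-positive definite if and only if there exist M-vectors $\alpha \neq 0$ and $\beta \neq 0$ such that

\[ \alpha' G_i = \beta' G_{i+1}, \quad i = 0, \ldots, W-2 \tag{A1} \]

and

\[ \alpha' G_{W-1} = 0 \tag{A2} \]

\[ \beta' G_0 = 0 \tag{A3} \]

Proof

Let

\[ X = E\left\{ \begin{bmatrix} \Lambda_{W-1}(k) \\ \Lambda_{W-1}(k+1) \end{bmatrix} [\,\Lambda'_{W-1}(k),\ \Lambda'_{W-1}(k+1)\,] \right\} = \begin{bmatrix} \Sigma_0 & \Sigma_1 \\ \Sigma_1' & \Sigma_0 \end{bmatrix} \]

Using the transformation

\[ T = \begin{bmatrix} I & 0 \\ -A & I \end{bmatrix} \]

we obtain

\[ T X T' = \begin{bmatrix} \Sigma_0 & 0 \\ 0 & \Gamma \end{bmatrix} \]

Since T is full rank and $\Sigma_0 > 0$, X and $\Gamma$ are semi-positive definite if and only if there
are non-zero M-vectors $\alpha$ and $\beta$ such that

\[ \alpha' \Lambda_{W-1}(k+1) = \beta' \Lambda_{W-1}(k) \tag{A4} \]

Recall

\[ \Lambda_{W-1}(k) = \sum_{s=0}^{W-1} G_s V^{-1} r(k-W+1+s) \]

Therefore, (A4) is equivalent to the conditions (A1)-(A3). From (8), we obtain

\[ \Sigma_0 = A \Sigma_0 A' + \Gamma \]

It follows that A only has eigenvalues of magnitudes less than or equal to unity, and it
has at least one unity eigenvalue if and only if $\Gamma$ is semi-positive definite.

Q.E.D.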
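The block-diagonalization step of the proof can be checked numerically. The short sketch below uses hypothetical signature matrices $G_t$ with V = I (values assumed here purely for illustration) and verifies that $TXT'$ has $\Sigma_0$ and $\Gamma$ on its diagonal blocks and zeros elsewhere.

```python
import numpy as np

W = 4
# Hypothetical signature matrices G_t (M = 2, m = 2), with V taken as the identity.
G = np.array([[[1.0, 0.0], [0.5, 0.25 * (t + 1)]] for t in range(W)])
sigma0 = sum(G[t] @ G[t].T for t in range(W))
sigma1 = sum(G[t + 1] @ G[t].T for t in range(W - 1))
A = sigma1.T @ np.linalg.inv(sigma0)          # (10a)
Gamma = sigma0 - A @ sigma1                   # (10b)

I2, Z2 = np.eye(2), np.zeros((2, 2))
X = np.block([[sigma0, sigma1], [sigma1.T, sigma0]])
T = np.block([[I2, Z2], [-A, I2]])
TXT = T @ X @ T.T
print(np.round(TXT, 6))
```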

Suppose all signatures vanish for elapsed times greater than W-1, i.e. $g_i(t) = 0$ for
$t > W-1$ and $i = 1, \ldots, M$. Then (A1)-(A3) are equivalent to the condition that it is not
possible to distinguish between a failure occurring at a certain time and failures
occurring one time step earlier or later. Moreover, (A1)-(A3) indicate that only a
special class of failure signatures would satisfy this indistinguishability condition for all
values of W. Generally, it is possible to choose a sufficiently large W so that this
situation is avoided.


Table 1. Failure Signatures.

2.32  2.01
2.01  4.58

.875

Table 2. Parameters for $\Lambda_{W-1}$, $\ell$, and z.

$c(1) = c(2) = 1$
$L(1,2) = L(2,1) = 10$, $L(1,1) = L(2,2) = 0$, $L_F = 9$
$T = 8$, $Q = 8$, $p = .0002$
Prior probability of failure $(i,\tau)$: $.5\,p\,(1-p)^{\tau-1}$, $i = 1, 2$

Table 3. Cost Functions and Prior Probability.
Figure 5. $\sum_{s=0}^{t} b_1(s \mid 1)$: SW, MA, and Q20

Figure 6. $\sum_{s=0}^{t} b_2(s \mid 2)$: SW, MA, and Q20

Figure 7. Thresholds Chosen by SQP

Figure 8. $b_0(t \mid 0)$: SW and MI